
Reinforcement Learning for Autonomous Vehicles with CARLA Simulator

Keywords: Robotics, Reinforcement Learning, CARLA, Autonomous Driving

[Figure: CARLA_RL]

Autonomous vehicles (AVs) hold the promise of revolutionizing transportation, yet developing robust decision-making systems for complex urban navigation remains a significant challenge. This project addresses this by developing and evaluating a reinforcement learning (RL) framework for autonomous driving within the CARLA simulator. A Double Deep Q-Network (DDQN) agent, integrating a Convolutional Neural Network (CNN) for spatial feature extraction from camera imagery and a Long Short-Term Memory (LSTM) network for temporal understanding, was implemented from scratch. The agent was trained to select from six discrete driving actions. Initial training attempts relying on pure exploration proved insufficient. Consequently, a modified strategy incorporating CARLA's built-in autopilot as a guiding baseline was adopted, alongside a custom reward function encouraging both safety and driving speed. After 40,000 training frames, the final model demonstrated a significant improvement in driving performance compared to its untrained state and the baseline autopilot. The agent successfully navigated simulated urban environments, both with and without traffic, exhibiting smooth control, collision avoidance, and lane adherence. This work validates the efficacy of the CNN+LSTM DDQN approach for autonomous driving tasks in simulation and highlights the importance of guided learning strategies. The developed system provides a foundational step towards more sophisticated autonomous decision-making.

Introduction

With the rapid development of technology, autonomous vehicles offer a potential way to revolutionize transportation by improving safety, reducing traffic congestion, and enhancing mobility. However, developing autonomous vehicles (AVs) capable of safely and efficiently navigating complex urban environments remains a significant challenge in both artificial intelligence and robotics. Traditional rule-based systems struggle to handle the vast variability and unpredictability inherent in real-world driving scenarios. Hence, reinforcement learning appears to be a promising approach to autonomous driving and is gaining popularity in the robotics field. The primary problem addressed in this project is to develop a robust decision-making framework for autonomous vehicles using reinforcement learning (RL). Since real-world data is not easy to obtain, the entire system was implemented in the CARLA simulator, a realistic open-source simulator for developing autonomous vehicles.

Approaches

The project consists of three main components: a simulator, the environment state, and an agent.

[Figure: Overall project workflow]



The simulator generates environmental data. The environment state provides input for the model, including observations and rewards. The agent is the reinforcement learning model being trained: from its observations, it generates actions, which are sent back to the simulator as control commands for the car.

> 1. CARLA Simulator

CARLA is an open-source simulator for autonomous driving research that provides a powerful Python API to control simulation aspects like traffic, pedestrians, weather, and sensors. It allows configuration of diverse sensors such as LIDARs, cameras, depth sensors, and GPS. In this project, CARLA was used to generate environmental datasets and provide camera frame observations for the agent. An RGB camera with 640 × 480 resolution and a 110-degree FOV was used, along with a collision sensor. To simulate traffic, 15 non-player characters (NPCs) were spawned. For each epoch, the environment was reset and the car was spawned at a random position to enable learning in diverse situations. The agent could choose from six discrete actions: hard left turn, soft left turn, straight, soft right turn, hard right turn, and emergency stop.
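
The exact sensor configuration code is not reproduced here, but a minimal sketch of how such a setup can be created through CARLA's Python API is shown below. The vehicle blueprint, camera mounting position, steering/throttle values, and the queue/list used to collect sensor data are illustrative assumptions rather than the project's actual code.

```python
import queue
import random

import carla

frame_queue = queue.Queue()     # camera frames consumed by the agent
collision_events = []           # collision records used by the reward function

# Connect to a running CARLA server (default port 2000).
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()
blueprints = world.get_blueprint_library()

# Spawn the ego vehicle at a random spawn point (reset every epoch).
vehicle_bp = blueprints.filter("vehicle.tesla.model3")[0]   # blueprint choice is an assumption
spawn_point = random.choice(world.get_map().get_spawn_points())
vehicle = world.spawn_actor(vehicle_bp, spawn_point)

# Attach a 640 x 480 RGB camera with a 110-degree FOV.
camera_bp = blueprints.find("sensor.camera.rgb")
camera_bp.set_attribute("image_size_x", "640")
camera_bp.set_attribute("image_size_y", "480")
camera_bp.set_attribute("fov", "110")
camera = world.spawn_actor(camera_bp,
                           carla.Transform(carla.Location(x=2.0, z=1.4)),  # mounting pose is an assumption
                           attach_to=vehicle)
camera.listen(frame_queue.put)

# Attach a collision sensor so collisions can be penalized.
collision_bp = blueprints.find("sensor.other.collision")
collision_sensor = world.spawn_actor(collision_bp, carla.Transform(), attach_to=vehicle)
collision_sensor.listen(collision_events.append)

# One possible mapping from the six discrete actions to CARLA controls
# (the steering/throttle magnitudes are illustrative assumptions).
ACTIONS = {
    0: carla.VehicleControl(throttle=0.5, steer=-0.5),   # hard left turn
    1: carla.VehicleControl(throttle=0.5, steer=-0.2),   # soft left turn
    2: carla.VehicleControl(throttle=0.5, steer=0.0),    # straight
    3: carla.VehicleControl(throttle=0.5, steer=0.2),    # soft right turn
    4: carla.VehicleControl(throttle=0.5, steer=0.5),    # hard right turn
    5: carla.VehicleControl(throttle=0.0, brake=1.0),    # emergency stop
}
vehicle.apply_control(ACTIONS[2])   # e.g. drive straight
```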

> 2. Driving Policy (Reinforcement Learning, RL)

The project's goal was to develop a DDQN model so that a car can cruise autonomously without collisions. A study by Khlifi et al. (2025) suggested that a CNN+LSTM-based DDQN agent is a suitable approach. The workflow starts with capturing and preprocessing image frames from a front-facing camera. A CNN extracts spatial features (e.g., road structure, obstacles), which are then fed into an LSTM to capture temporal dependencies, enabling the agent to understand motion and context. The LSTM's output goes through fully connected layers to generate Q-values for each possible driving action. During training, DDQN logic, which uses a target network alongside the online network, updates the CNN, LSTM, and fully connected layers based on observed rewards to minimize overestimation bias. The final action is selected by picking the one with the highest Q-value.
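
The double-Q target computation described above can be written as a short PyTorch sketch. The function name, batch layout, and tensor shapes are assumptions for illustration; the discount factor, SmoothL1 loss, and Polyak coefficient correspond to the settings listed in the next subsection.

```python
import torch
import torch.nn.functional as F

def ddqn_update(online_net, target_net, optimizer, batch, gamma=0.99, tau=5e-3):
    """One Double-DQN update step (illustrative sketch, not the project's exact code)."""
    states, actions, rewards, next_states, dones = batch   # assumed batch layout

    # Q-values of the actions actually taken, from the online network.
    q_taken = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Double-DQN: the online network selects the next action,
        # the target network evaluates it, reducing overestimation bias.
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * (1.0 - dones) * next_q

    loss = F.smooth_l1_loss(q_taken, targets)   # SmoothL1Loss is less sensitive to outliers
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Polyak (soft) update of the target network parameters.
    with torch.no_grad():
        for p_target, p_online in zip(target_net.parameters(), online_net.parameters()):
            p_target.mul_(1.0 - tau).add_(tau * p_online)

    return loss.item()
```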

[Figure: CARLA_RL model architecture]



A modified reward function was used to encourage both safety and speed. The function is expressed as:

[Equation: reward function]

where Vt is the vehicle's speed at time t and λ is a positive weight, set to 0.2 in this project.
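
The original formula appears only as an image, so the following is a plausible reconstruction consistent with the description (a fixed penalty on collision plus a speed incentive weighted by λ); the collision-penalty term r_collision is an assumed symbol, not taken from the report.

```latex
r_t =
\begin{cases}
  -\,r_{\text{collision}} & \text{if a collision occurs at time } t, \\
  \lambda \, V_t          & \text{otherwise,}
\end{cases}
\qquad \lambda = 0.2
```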

> 3. DDQN Agent Implementation

The DDQN agent was implemented with the following key settings:

  • Discount Factor: 0.99, to prioritize long-term rewards.
  • Learning Rate: 2e-4 for the Adam optimizer.
  • Batch Size: 64.
  • Epsilon Greedy Exploration: Epsilon starts at 1.0, decays to a final value of 0.05 over 50,000 steps.
  • Loss Function: SmoothL1Loss, which is less sensitive to outliers than MSELoss.
  • Polyak Averaging: 5e-3, used for soft updates of the target network.

  • The DriverModel consists of a three-layer CNN for feature extraction, an LSTM network with a hidden size of 256 to process the sequential features, and a fully connected layer to output the Q-values for the six possible actions.
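
A minimal PyTorch sketch of such a DriverModel is given below. The convolution channel counts, kernel sizes, pooling, and input resolution are assumptions for illustration; the three-layer CNN, the LSTM hidden size of 256, and the six-action output follow the description above.

```python
import torch
import torch.nn as nn

class DriverModel(nn.Module):
    """CNN + LSTM Q-network: per-frame spatial features, temporal context over a frame sequence."""

    def __init__(self, n_actions=6, lstm_hidden=256):
        super().__init__()
        # Three-layer CNN for spatial feature extraction (channel/kernel sizes are assumptions).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),      # -> 64 * 4 * 4 = 1024 features per frame
        )
        self.lstm = nn.LSTM(input_size=1024, hidden_size=lstm_hidden, batch_first=True)
        self.head = nn.Linear(lstm_hidden, n_actions)        # one Q-value per discrete action

    def forward(self, x):
        # x: (batch, seq_len, 3, H, W) -- a short sequence of preprocessed camera frames.
        b, t, c, h, w = x.shape
        features = self.cnn(x.view(b * t, c, h, w)).view(b, t, -1)
        out, _ = self.lstm(features)       # temporal dependencies across the frame sequence
        return self.head(out[:, -1])       # Q-values from the last time step: (batch, n_actions)

# Quick shape check with a dummy batch: 4 sequences of 5 frames at 84 x 84 (sizes are assumptions).
if __name__ == "__main__":
    model = DriverModel()
    print(model(torch.zeros(4, 5, 3, 84, 84)).shape)   # torch.Size([4, 6])
```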

Results

All the result videos can be found in the following YouTube playlist:
https://youtube.com/playlist?list=PLyd5R-snIdOv5p9GStKQkrjmlwmka8-Xo&si=ePVuansyVjbfEqf


1. Initial driving status without model training and autopilot

Before we started training the model, it was important to check that our code worked properly. Since we want to train an RL model, we first need the car agent to drive around randomly. In the first video, we test the agent without any training. As the video shows, the agent simply drives around and tries to maximize its speed. Since there is no goal point for the agent to reach, it drives recklessly, crashing into obstacles, driving in the wrong lane, and ignoring traffic lights. This caused the learning results to be unsatisfying even after training for many epochs. Because the reward function has no penalty for violating traffic lights or crossing lanes, the agent only tried to avoid collisions while otherwise driving recklessly, which is not what we want.




2. Initial driving status with autopilot but without model training

To deal with the problem mentioned in the previous section, we introduce the autopilot feature provided by the CARLA simulator. The initial driving behavior with the autopilot enabled can be seen in the second video. With no traffic on the road, the autopilot is able to drive around the town. Although it sometimes collides with obstacles or drifts into other lanes, it can still complete its route.



However, once traffic is added to the town, as the third video shows, the more obstacles there are, the more likely the autopilot is to fail to drive properly. Therefore, we can train the model starting from the autopilot's behavior instead of training it purely from a random state.




The fourth video shows the initial training with the autopilot feature in a traffic-free world. (Note that the "autopilot" feature can be thought of as playing a role similar to a "pretrained weight" in model training, although it is not actually a machine learning or deep learning model weight.)


3. Halfway model training result with autopilot

After training for 20,000 frames, which is halfway through the total training, the agent drives much better in the town with traffic. Although it still occasionally runs into obstacles or other cars, there is a significant improvement compared to the original autopilot. The fifth video shows the halfway-training result.


4. Final model training result with autopilot

The final model, trained for 40,000 frames, is quite satisfying. The agent is now able to drive safely in the town in both situations, with and without traffic. No collisions occur, and the agent drives smoothly without invading other lanes. The sixth video (left) shows the final model driving in a traffic-free world, while the seventh video (right) shows it driving in a world with traffic.

Driving in a traffic-free world

Driving in a world with traffic


Summary

Initially, without any training, the agent drove recklessly, aiming only to maximize speed, resulting in frequent crashes and traffic violations. This approach led to unsatisfying learning results. To address this, CARLA's built-in autopilot was introduced as a baseline guide. While the autopilot could navigate an empty town, its performance degraded significantly with traffic. The model was then trained using the autopilot as a starting point. After 20,000 training frames, the agent showed significant improvement, although it still had occasional collisions. The final model, after 40,000 frames, was quite satisfying, enabling the agent to drive smoothly and safely in environments both with and without traffic, avoiding collisions and adhering to lanes.

The trained agent demonstrated a marked improvement over both its untrained state and the baseline CARLA autopilot. It navigated smoothly, avoided collisions, and maintained lane discipline in various scenarios. However, there is room for improvement. The current six-action discrete control could be replaced with continuous control for smoother driving. Additionally, the reward function could be adjusted to penalize prolonged stopping, as the agent sometimes stops indefinitely behind a stalled vehicle to avoid a collision.

References

> For the full report, please visit: https://drive.google.com/file/d/1D8MMZFLiElux2fCvvzgI47pYHE62upQL/view?usp=sharing

> For the project code, please visit: https://github.com/htliang517/CARLA_RL
